Eigen-CAM: Class Activation Map using Principal Components
Deep neural networks are ubiquitous due to the ease of developing models and
their influence on other domains. At the heart of this progress is
convolutional neural networks (CNNs) that are capable of learning
representations or features given a set of data. Making sense of such complex
models (i.e., millions of parameters and hundreds of layers) remains
challenging for developers as well as the end-users. This is partially due to
the lack of tools or interfaces capable of providing interpretability and
transparency. A growing body of literature, for example, class activation map
(CAM), focuses on making sense of what a model learns from the data or why it
behaves poorly in a given task. This paper builds on previous ideas to cope
with the increasing demand for interpretable, robust, and transparent models.
Our approach provides a simpler, more intuitive way of generating
CAMs. The proposed Eigen-CAM computes and visualizes the principal components of
the learned features/representations from the convolutional layers. Empirical
studies were performed to compare Eigen-CAM with state-of-the-art
methods (such as Grad-CAM, Grad-CAM++, and CNN-fixations) on benchmark
tasks such as weakly-supervised localization and localizing
objects in the presence of adversarial noise. Eigen-CAM was found to be robust
against classification errors made by the fully connected layers in CNNs; it
does not rely on backpropagated gradients, class relevance scores, maximum
activation locations, or any other form of feature weighting. In addition, it
works with all CNN models without the need to modify layers or retrain models.
Empirical results show up to a 12% improvement over the best of the compared
methods on weakly-supervised object localization. Comment: 7 pages, 4 figures
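The core computation the abstract describes, projecting the convolutional feature maps onto their first principal component, can be sketched in a few lines. The centering step, the min-max normalization, and the H x W x C layout are illustrative choices, not details taken from the paper:

```python
import numpy as np

def eigen_cam(activations):
    """Minimal Eigen-CAM sketch: project the convolutional activations
    onto their first principal component (assumed layout: H x W x C)."""
    h, w, c = activations.shape
    flat = activations.reshape(h * w, c)
    flat = flat - flat.mean(axis=0)          # centering: an illustrative choice
    # first right singular vector = principal direction in channel space
    _, _, vt = np.linalg.svd(flat, full_matrices=False)
    cam = (flat @ vt[0]).reshape(h, w)       # project each location onto it
    # min-max normalize to [0, 1] for visualization
    return (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)

rng = np.random.default_rng(0)
cam = eigen_cam(rng.standard_normal((7, 7, 64)))
print(cam.shape)  # (7, 7)
```

Because the projection needs no gradients or class scores, only the forward activations, this matches the abstract's claim that no backpropagation or feature weighting is required.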
Robust Modeling of Epistemic Mental States
This work identifies and advances some research challenges in the analysis of
facial features and their temporal dynamics with epistemic mental states in
dyadic conversations. The epistemic states considered are Agreement,
Concentration, Thoughtful, Certain, and Interest. In this paper, we perform a number of
statistical analyses and simulations to identify the relationship between
facial features and epistemic states. Non-linear relations are found to be more
prevalent, and temporal features derived from the original facial features show
a strong correlation with intensity changes. We then propose a
novel prediction framework that takes facial features and their nonlinear
relation scores as input and predicts the different epistemic states in videos.
Prediction of epistemic states is boosted when the classification of
emotion-change regions (rising, falling, or steady-state) is combined with
the temporal features. The proposed predictive models can predict the epistemic
states with significantly improved accuracy: correlation coefficient (CoERR)
for Agreement is 0.827, for Concentration 0.901, for Thoughtful 0.794, for
Certain 0.854, and for Interest 0.913. Comment: Accepted for publication in
Multimedia Tools and Applications, Special Issue: Socio-Affective Technologies
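The abstract does not specify how the nonlinear relation scores are computed. As a minimal stand-in for a measure that captures monotone but nonlinear dependence between a facial-feature series and a state-intensity series, a rank (Spearman) correlation could look like this; it is an illustration, not the authors' method:

```python
import numpy as np

def spearman(x, y):
    """Rank (Spearman) correlation between two series; assumes no ties.
    Used here only as a stand-in nonlinear relation score."""
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    rx -= rx.mean()
    ry -= ry.mean()
    return float(rx @ ry / np.sqrt((rx @ rx) * (ry @ ry)))

t = np.linspace(0.0, 1.0, 50)          # e.g., a feature trajectory over time
feature = t ** 3                        # monotone but nonlinear in t
print(round(spearman(t, feature), 3))   # 1.0: rank correlation captures it
```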
Learning Representations from EEG with Deep Recurrent-Convolutional Neural Networks
One of the challenges in modeling cognitive events from electroencephalogram
(EEG) data is finding representations that are invariant to inter- and
intra-subject differences, as well as to inherent noise associated with such
data. Herein, we propose a novel approach for learning such representations
from multi-channel EEG time-series, and demonstrate its advantages in the
context of a mental load classification task. First, we transform EEG activity
into a sequence of topology-preserving multi-spectral images, as opposed to
standard EEG analysis techniques that ignore such spatial information. Next, we
train a deep recurrent-convolutional network inspired by state-of-the-art video
classification to learn robust representations from the sequence of images. The
proposed approach is designed to preserve the spatial, spectral, and temporal
structure of EEG which leads to finding features that are less sensitive to
variations and distortions within each dimension. Empirical evaluation on the
cognitive load classification task demonstrated significant improvements in
classification accuracy over current state-of-the-art approaches in this field. Comment: To be published as a conference paper at ICLR 201
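The first step described above, turning multi-channel EEG into topology-preserving multi-spectral images, can be sketched as follows. The band edges, the inverse-distance interpolation, and the assumption that electrode positions are already projected to 2D are all illustrative simplifications, not the paper's exact pipeline:

```python
import numpy as np

def eeg_to_image(signals, locs2d, fs=128.0,
                 bands=((4, 8), (8, 13), (13, 30)), size=16):
    """Sketch: multi-channel EEG -> multi-spectral topographic image.
    signals: (n_channels, n_samples); locs2d: (n_channels, 2) electrode
    positions already projected into [0, 1]^2 (projection method, band
    edges, and interpolation scheme are illustrative assumptions)."""
    n_ch, n_samp = signals.shape
    freqs = np.fft.rfftfreq(n_samp, d=1.0 / fs)
    spectra = np.abs(np.fft.rfft(signals, axis=1)) ** 2
    # per-channel power in each frequency band -> (n_channels, n_bands)
    powers = np.stack([spectra[:, (freqs >= lo) & (freqs < hi)].sum(axis=1)
                       for lo, hi in bands], axis=1)
    # spread the scattered channel values over a grid, preserving topology
    ys, xs = np.meshgrid(np.linspace(0, 1, size), np.linspace(0, 1, size),
                         indexing="ij")
    grid = np.stack([xs.ravel(), ys.ravel()], axis=1)
    dist = np.linalg.norm(grid[:, None, :] - locs2d[None, :, :], axis=2)
    weights = 1.0 / (dist + 1e-6)
    weights /= weights.sum(axis=1, keepdims=True)  # inverse-distance weights
    return (weights @ powers).reshape(size, size, len(bands))

rng = np.random.default_rng(1)
img = eeg_to_image(rng.standard_normal((8, 256)), rng.random((8, 2)))
print(img.shape)  # (16, 16, 3)
```

A sequence of such images over time windows would then feed the recurrent-convolutional network the abstract describes.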
Prosody Based Co-analysis for Continuous Recognition of Coverbal Gestures
Although speech and gesture recognition have been studied extensively,
successful attempts to combine them in a unified framework have been
semantically motivated, e.g., by keyword-gesture co-occurrence. Such formulations
inherited the complexity of natural language processing. This paper presents a
Bayesian formulation that uses the interplay of gesture and speech articulation
to improve the accuracy of automatic recognition of continuous coverbal
gestures. Prosodic features from the speech signal were co-analyzed with the
visual signal to learn the prior probability of co-occurrence of prominent
spoken segments with particular kinematic phases of gestures. It was
found that the above co-analysis helps in detecting and disambiguating visually
small gestures, which subsequently improves the rate of continuous gesture
recognition. The efficacy of the proposed approach was demonstrated on a large
database collected from the weather channel broadcast. This formulation opens
new avenues for bottom-up frameworks of multimodal integration. Comment: Alternatively, see:
http://vision.cse.psu.edu/kettebek/academ/publications.ht
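The learned prior the abstract mentions, the probability of a gesture phase given a prominent spoken segment, can be estimated from co-occurrence counts on frame-aligned labels. The phase names and the toy alignment below are illustrative assumptions, not the paper's data:

```python
# Sketch: estimate the prior P(gesture phase | prominent speech segment)
# from co-occurrence counts on frame-aligned labels.
PHASES = ["preparation", "stroke", "retraction", "hold"]

def cooccurrence_prior(gesture_frames, prominence_frames):
    """gesture_frames: per-frame phase labels; prominence_frames:
    per-frame booleans marking prosodically prominent speech."""
    counts = {p: 0 for p in PHASES}
    total = 0
    for phase, prominent in zip(gesture_frames, prominence_frames):
        if prominent:            # count phases only inside prominent segments
            counts[phase] += 1
            total += 1
    return {p: counts[p] / total for p in PHASES}

g = ["preparation", "stroke", "stroke", "hold", "retraction", "stroke"]
s = [False, True, True, True, False, True]
prior = cooccurrence_prior(g, s)
print(prior["stroke"])  # 0.75: strokes dominate the prominent segments
```

Such a prior can then bias a Bayesian gesture recognizer toward phases that tend to co-occur with prosodic prominence, which is the disambiguation effect the abstract reports.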
Semantically linking and browsing PubMed abstracts with gene ontology
Background: The technological advances of the past decade have led to massive progress in the field of biotechnology, documented in the form of research articles. PubMed, currently the most used repository for bio-literature, contains about 17 million abstracts as of 2007, which requires methods to efficiently retrieve and browse large volumes of relevant information. State-of-the-art technologies such as GOPubmed use simple keyword-based techniques for retrieving abstracts from PubMed and linking them to the Gene Ontology (GO). This paper changes the paradigm by introducing a semantics-enabled technique, called SEGOPubmed, to link PubMed to the Gene Ontology for ontology-based browsing. A Latent Semantic Analysis (LSA) framework is used to semantically interface PubMed abstracts with the Gene Ontology.
Results: An empirical analysis is performed to compare the performance of SEGOPubmed with GOPubmed. The analysis is initially performed using a few well-referenced query words; statistical analysis is then performed using a GO-curated dataset as ground truth. The analysis suggests that SEGOPubmed performs better than the classic GOPubmed because it incorporates semantics.
Conclusions: The LSA technique is applied to the PubMed abstracts retrieved for the user query, and the semantic similarity between the query and the abstracts is computed. The analyses using well-referenced keywords show that the proposed semantics-sensitive technique outperforms string-comparison-based techniques in associating relevant abstracts with GO terms. SEGOPubmed also extracted abstracts in which the keywords do not appear in isolation (i.e., they appear in combination with other terms), which could not be retrieved by simple term-matching techniques.
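A minimal LSA sketch of the kind the abstract describes, term-document counts reduced by truncated SVD, with query fold-in and dot-product scoring in the latent space, might look like this. It is an illustration, not the SEGOPubmed pipeline; a real system would use TF-IDF weighting and cosine similarity over a much larger corpus:

```python
import numpy as np

def lsa_scores(docs, query, k=2):
    """Minimal LSA sketch: term-document counts -> truncated SVD ->
    query fold-in -> dot-product relevance scores in the latent space."""
    vocab = sorted({w for text in docs + [query] for w in text.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}

    def vec(text):
        v = np.zeros(len(vocab))
        for w in text.lower().split():
            v[index[w]] += 1.0
        return v

    A = np.stack([vec(d) for d in docs], axis=1)   # terms x docs count matrix
    u, s, vt = np.linalg.svd(A, full_matrices=False)
    docs_k = (np.diag(s[:k]) @ vt[:k]).T           # documents in latent space
    q_k = vec(query) @ u[:, :k]                    # fold the query in
    return docs_k @ q_k                            # relevance score per doc

abstracts = [
    "gene expression regulation in yeast",
    "protein binding and gene ontology annotation",
    "weather forecasting with radar data",
]
sims = lsa_scores(abstracts, "gene ontology terms")
print(int(np.argmax(sims)))  # 1: the abstract sharing both query terms wins
```

Because scoring happens in the latent space rather than by string matching, related abstracts can rank highly even when query terms appear only in combination with other terms, which is the behavior the conclusions describe.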